Abstract

This is a preliminary article stating and proving a new maximum entropy theorem. The entropies that we consider can be used as measures of biodiversity. In that context, the question is: for a given collection of species, which frequency distribution(s) maximize the diversity? The theorem provides the answer. The chief surprise is that although we are dealing with not just a single entropy, but a one-parameter family of entropies, there is a single distribution maximizing all of them simultaneously.

This article is preliminary. Little motivation or context is given, and the proofs are probably not optimal. I hope to write this up in a more explanatory and polished way in due course.
1 Statement of the problem

Basic definitions Fix an integer $n \geq 1$ throughout. A similarity matrix is an $n \times n$ symmetric matrix $Z$ with entries in the interval $[0, 1]$, such that $Z_{ii} = 1$ for all $i$. A (probability) distribution is an $n$-tuple $p = (p_1, \ldots, p_n)$ with $p_i \geq 0$ for all $i$ and $\sum_{i=1}^n p_i = 1$.

Given a similarity matrix $Z$ and a distribution $p$, thought of as a column vector, we may form the matrix product $Zp$ (also a column vector), and we denote by $(Zp)_i$ its $i$th entry.

Lemma 1.1 Let $Z$ be a similarity matrix and $p$ a distribution. Then $p_i \leq (Zp)_i \leq 1$ for all $i \in \{1, \ldots, n\}$. In particular, if $p_i > 0$ then $(Zp)_i > 0$.

Proof We have
$$(Zp)_i = \sum_{j=1}^n Z_{ij} p_j = p_i + \sum_{j \neq i} Z_{ij} p_j \geq p_i$$
and $(Zp)_i \leq \sum_{j=1}^n p_j = 1$. □

Let $Z$ be a similarity matrix and let $q \in [0, \infty)$. The function $H_q^Z$ is defined on distributions $p$ by
$$H_q^Z(p) = \begin{cases} \dfrac{1}{q-1}\Big(1 - \sum_{i:\, p_i > 0} p_i (Zp)_i^{q-1}\Big) & \text{if } q \neq 1,\\[1ex] -\sum_{i:\, p_i > 0} p_i \log (Zp)_i & \text{if } q = 1. \end{cases}$$
Lemma 1.1 guarantees that these definitions are valid. The definition in the case $q = 1$ is explained by the fact that $H_1^Z(p) = \lim_{q \to 1} H_q^Z(p)$ (easily shown using l'Hôpital's rule). We call $H_q^Z$ the entropy of order $q$.
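The definition of $H_q^Z$ translates directly into code. The following Python sketch is our own illustration, not code from the text: the function name `entropy` and the list-of-lists representation of $Z$ are assumptions. It sums only over the support of $p$, as the definition requires.

```python
import math


def entropy(Z, p, q):
    """Entropy H_q^Z(p) of order q >= 0 (illustrative sketch).

    Z: n x n similarity matrix as a list of lists.
    p: distribution as a list of n non-negative numbers summing to 1.
    """
    n = len(p)
    # (Zp)_i, the i-th entry of the column vector Zp
    Zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    # Only i with p_i > 0 contribute; there (Zp)_i > 0 by Lemma 1.1
    support = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return -sum(p[i] * math.log(Zp[i]) for i in support)
    return (1 - sum(p[i] * Zp[i] ** (q - 1) for i in support)) / (q - 1)


I2 = [[1, 0], [0, 1]]
uniform = [0.5, 0.5]
print(entropy(I2, uniform, 2))  # 0.5, the Gini-Simpson index
print(entropy(I2, uniform, 1))  # log 2, Shannon entropy in nats
```

Evaluating `entropy` at orders $q$ close to 1 gives values close to the $q = 1$ value, consistent with the limit just mentioned.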

Notes on the literature The entropies $H_q^Z$ were introduced in this generality by Ricotta and Szeidl in 2006, as an index of the diversity of an ecological community [RS]. Think of $n$ as the number of species, $Z_{ij}$ as indicating the similarity of the $i$th and $j$th species, and $p_i$ as the relative abundance of the $i$th species. Ricotta and Szeidl used not similarities $Z_{ij}$ but dissimilarities or 'distances' $d_{ij}$; the formulas above become equivalent to theirs on putting $Z_{ij} = 1 - d_{ij}$.

The case $Z = I$ goes back further. Something very similar to $H_q^I$, using logarithms to base 2 rather than base $e$, appeared in information theory in 1967, in a paper of Havrda and Charvát [HC]. Later, the entropies $H_q^I$ were discovered in statistical ecology, in a 1982 paper of Patil and Taillie [PT]. Finally, they were rediscovered in physics, in a 1988 paper of Tsallis [Tsa].

Still in the case $Z = I$, certain values of $q$ give famous quantities. The entropy $H_1^I$ is Shannon entropy (except that Shannon used logarithms to base 2). The entropy $H_2^I$ is known in ecology as the Simpson or Gini-Simpson index; it is the probability that two individuals chosen at random are of different species.

For general $Z$, the entropy $H_2^Z$ of order 2 is known as Rao's quadratic entropy [Rao]. It is usually stated in terms of the matrix with $(i, j)$-entry $1 - Z_{ij}$, that is, the matrix of dissimilarities mentioned above.

One way to obtain a similarity matrix is to start with a finite metric space $\{a_1, \ldots, a_n\}$ and put $Z_{ij} = e^{-d(a_i, a_j)}$. Matrices of this kind are investigated in depth in [Lei2] and other papers cited therein. Here, metric spaces will only appear in two examples (4.4 and 4.5).

The maximum entropy problem Let $Z$ be a similarity matrix and let $q \in [0, \infty)$. The maximum entropy problem is this:

For which distribution(s) $p$ is $H_q^Z(p)$ maximal, and what is the maximum value?



The solution is given in Theorem 3.2. The terms used in the statement of 
the theorem will be defined shortly. However, the following striking fact can be 
stated immediately: 



There is a distribution maximizing $H_q^Z$ for all $q$ simultaneously.



So even though the entropies of different orders rank distributions differently, 
there is a distribution that is maximal for all of them. 

For example, this fully explains the numerical coincidence noted in the Results section of [AKB].

Restatement in terms of diversity Let $Z$ be a similarity matrix. For each $q \in [0, \infty)$, define a function $D_q^Z$ on distributions $p$ by
$$D_q^Z(p) = \begin{cases} \big(1 - (q-1) H_q^Z(p)\big)^{1/(1-q)} & \text{if } q \neq 1,\\ e^{H_1^Z(p)} & \text{if } q = 1. \end{cases}$$
These diversities were introduced informally in [Lei1], and are explained and developed in [LC]. The case $Z = I$ is well known in several fields: in information theory, $\log D_q^I$ is called the Rényi entropy of order $q$ [Ren]; in ecology, $D_q^I$ is called the Hill number of order $q$ [Hill]; and in economics, $1/D_q^I$ is the Hannah-Kay measure of concentration [HK].
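Substituting the definition of $H_q^Z$ into that of $D_q^Z$ gives an explicit formula for the diversities, and inverting gives $H_q^Z$ back from $D_q^Z$. This is a routine calculation, written out here for reference:

```latex
\begin{aligned}
% q \neq 1: substitute the definition of H_q^Z
D_q^Z(p) &= \Big(\sum_{i:\,p_i > 0} p_i (Zp)_i^{q-1}\Big)^{1/(1-q)},
  & H_q^Z(p) &= \frac{1 - D_q^Z(p)^{1-q}}{q - 1},\\
% q = 1
D_1^Z(p) &= \prod_{i:\,p_i > 0} (Zp)_i^{-p_i},
  & H_1^Z(p) &= \log D_1^Z(p).
\end{aligned}
```

For $q \neq 1$ the map $D \mapsto (1 - D^{1-q})/(q-1)$ has derivative $D^{-q} > 0$, and $\log$ is increasing, which is one way to see that the passage between $H_q^Z$ and $D_q^Z$ is order-preserving.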

The transformation between $H_q^Z$ and $D_q^Z$ is invertible and order-preserving (increasing). Hence the maximum entropy problem is equivalent to the maximum diversity problem:



For which distribution(s) $p$ is $D_q^Z(p)$ maximal, and what is the maximum value?



The solution is given in Theorem 3.1. It will be more convenient mathematically to work with diversity rather than entropy. Thus, we prove results about diversity and deduce results about entropy.

When stated in terms of diversity, a further striking aspect of the solution 
becomes apparent: 



There is a distribution maximizing $D_q^Z$ for all $q$ simultaneously.
The maximum value of $D_q^Z$ is the same for all $q$.



So every similarity matrix has an unambiguous 'maximum diversity', the maximum value of $D_q^Z$ for any $q$.

A similarity matrix may have more than one maximizing distribution, but the collection of maximizing distributions is independent of $q > 0$. In other words, a distribution that maximizes $D_q^Z$ for some $q > 0$ actually maximizes $D_q^Z$ for all $q$ (Corollary 4.1).

The diversities $D_q^Z$ are closely related to generalized means [HLP], also called power means. Given a finite set $I$, positive real numbers $(x_i)_{i \in I}$, positive real numbers $(p_i)_{i \in I}$ such that $\sum_i p_i = 1$, and $t \in \mathbb{R}$, the generalized mean of $(x_i)_{i \in I}$, weighted by $(p_i)_{i \in I}$, of order $t$, is
$$\begin{cases} \Big(\sum_{i \in I} p_i x_i^t\Big)^{1/t} & \text{if } t \neq 0,\\ \prod_{i \in I} x_i^{p_i} & \text{if } t = 0. \end{cases}$$
For example, if $p_i = p_j$ for all $i, j \in I$ then the generalized means of orders 1, 0 and $-1$ are the arithmetic, geometric and harmonic means, respectively.

Given a similarity matrix $Z$ and a distribution $p$, take $I = \{i \in \{1, \ldots, n\} \mid p_i > 0\}$. Then $1/D_q^Z(p)$ is the generalized mean of $((Zp)_i)_{i \in I}$, weighted by $(p_i)_{i \in I}$, of order $q - 1$. We deduce the following.

Lemma 1.2 Let $Z$ be a similarity matrix and $p$ a distribution. Then:
i. $D_q^Z(p)$ is continuous in $q \in [0, \infty)$
ii. if $(Zp)_i = (Zp)_j$ for all $i, j$ such that $p_i, p_j > 0$ then $D_q^Z(p)$ is constant over $q \in [0, \infty)$; otherwise, $D_q^Z(p)$ is strictly decreasing in $q \in [0, \infty)$

iii. $\lim_{q \to \infty} D_q^Z(p) = 1/\max_{i:\, p_i > 0} (Zp)_i$.

Proof All of these assertions follow from standard results on generalized means. Continuity is clear except perhaps at $q = 1$, where it follows from Theorem 3 of [HLP]. Part (ii) follows from Theorem 16 of [HLP], and part (iii) from Theorem 4. □

In the light of this, we define
$$D_\infty^Z(p) = 1 \Big/ \max_{i:\, p_i > 0} (Zp)_i.$$
There is no useful definition of $H_\infty^Z$, since $\lim_{q \to \infty} H_q^Z(p) = 0$ for all $Z$ and $p$.
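A minimal Python sketch of $D_q^Z$ for $q \in [0, \infty]$, built on the power-mean description above, can be used to check Lemma 1.2 numerically. The name `diversity` is hypothetical, and `math.inf` stands in for $q = \infty$.

```python
import math


def diversity(Z, p, q):
    """Diversity D_q^Z(p) for q in [0, inf], with q = math.inf for D_inf.

    1/D_q^Z(p) is the power mean of order q - 1 of the values (Zp)_i,
    weighted by p_i, over the support of p.
    """
    n = len(p)
    Zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    support = [i for i in range(n) if p[i] > 0]
    if q == math.inf:
        # D_inf(p) = 1 / max over the support of (Zp)_i
        return 1 / max(Zp[i] for i in support)
    if q == 1:
        # power mean of order 0 is the weighted geometric mean
        return math.exp(-sum(p[i] * math.log(Zp[i]) for i in support))
    return sum(p[i] * Zp[i] ** (q - 1) for i in support) ** (1 / (1 - q))


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
p = [0.7, 0.2, 0.1]
for q in [0, 0.5, 1, 2, 5, math.inf]:
    # Here (Zp)_i = p_i are not all equal, so by Lemma 1.2(ii) the
    # printed values strictly decrease, ending at 1 / 0.7.
    print(q, diversity(I3, p, q))
```

For the uniform distribution on three species with $Z = I$, every $(Zp)_i$ equals $1/3$, so the same function returns 3 for every order, as Lemma 1.2(ii) predicts.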



2 Preparatory results 

Here we make some definitions and prove some lemmas in preparation for solving 
the maximum diversity and entropy problems. Some of these definitions and 
lemmas can also be found in [Lei2] and [LW]. 

Convention: for the rest of this work, unlabelled summations $\sum$ are understood to be over all $i \in \{1, \ldots, n\}$ such that $p_i > 0$.



Weightings and magnitude 

Definition 2.1 Let $Z$ be a similarity matrix. A weighting on $Z$ is a column vector $w \in \mathbb{R}^n$ such that
$$Zw = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}.$$
A weighting $w$ is non-negative if $w_i \geq 0$ for all $i$, and positive if $w_i > 0$ for all $i$.

Lemma 2.2 Let $w$ and $x$ be weightings on $Z$. Then $\sum_{i=1}^n w_i = \sum_{i=1}^n x_i$.

Proof Write $u$ for the column vector $(1 \cdots 1)^t$, where $(\ )^t$ means transpose. Then
$$\sum_{i=1}^n w_i = u^t w = (Zx)^t w = x^t (Zw) = x^t u = \sum_{i=1}^n x_i,$$
using symmetry of $Z$. □

Definition 2.3 Let $Z$ be a similarity matrix on which there exists at least one weighting. Its magnitude is $|Z| = \sum_{i=1}^n w_i$ for any weighting $w$ on $Z$.

For example, if $Z$ is invertible then there is a unique weighting $w$ on $Z$, and $w_i$ is the sum of the $i$th row of $Z^{-1}$. So then
$$|Z| = \sum_{i,j=1}^n (Z^{-1})_{ij},$$
the sum of all $n^2$ entries of $Z^{-1}$. This formula also appears in [SP], [Shi] and [POP], for closely related reasons to do with diversity and its maximization.
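To make the formula concrete, here is a small Python sketch of our own. It uses hand-written Gaussian elimination rather than any particular library, and it assumes $Z$ is invertible. It computes the weighting by solving $Zw = (1, \ldots, 1)^t$ and compares $\sum_i w_i$ with the sum of all entries of $Z^{-1}$. The helper names `solve`, `magnitude` and `inverse_entry_sum` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is square and invertible; raises ValueError if a pivot vanishes.
    """
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def magnitude(Z):
    """|Z| as the sum of the entries of the weighting w, where Zw = (1, ..., 1)^t."""
    return sum(solve(Z, [1.0] * len(Z)))


def inverse_entry_sum(Z):
    """Sum of all n^2 entries of Z^{-1}, computed column by column."""
    n = len(Z)
    total = 0.0
    for j in range(n):
        e_j = [1.0 if i == j else 0.0 for i in range(n)]
        total += sum(solve(Z, e_j))  # entries of column j of Z^{-1}
    return total


Z = [[1.0, 0.3, 0.1],
     [0.3, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
print(magnitude(Z), inverse_entry_sum(Z))  # the two numbers agree
```

For instance, for $Z = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}$ the weighting is $(2/3, 2/3)^t$ and `magnitude` returns $4/3$.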

Weight distributions 

Definition 2.4 Let $Z$ be a similarity matrix. A weight distribution for $Z$ is a distribution $p$ such that $(Zp)_1 = \cdots = (Zp)_n$.

Lemma 2.5 Let $Z$ be a similarity matrix.

i. If $Z$ admits a non-negative weighting then $|Z| > 0$.

ii. If $w$ is a non-negative weighting on $Z$ then $w/|Z|$ is a weight distribution for $Z$, and this defines a one-to-one correspondence between non-negative weightings and weight distributions.

iii. If $Z$ admits a weight distribution then $Z$ admits a weighting and $|Z| > 0$.

iv. If $p$ is a weight distribution for $Z$ then $(Zp)_i = 1/|Z|$ for all $i$.

Proof 

i. Let $w$ be a non-negative weighting. Certainly $|Z| = \sum_{i=1}^n w_i \geq 0$. Since we are assuming that $n \geq 1$, the vector $0$ is not a weighting, so $w_i > 0$ for some $i$. Hence $|Z| > 0$.

ii. The first part is clear. To see that this defines a one-to-one correspondence, take a weight distribution $p$, writing $(Zp)_i = K$ for all $i$. Since $\sum p_i = 1$ we have $p_i > 0$ for some $i$, and then $K = (Zp)_i > 0$ by Lemma 1.1. The vector $w = p/K$ is then a non-negative weighting.

The two processes, passing from a non-negative weighting to a weight distribution and vice versa, are easily shown to be mutually inverse.

iii. Follows from the previous parts. 

iv. Follows from the previous parts. □ 

The first connection between magnitude and diversity is this: 

Lemma 2.6 Let $Z$ be a similarity matrix and $p$ a weight distribution for $Z$. Then $D_q^Z(p) = |Z|$ for all $q \in [0, \infty]$.

Proof By continuity, it is enough to prove this for $q \neq 1, \infty$. In that case, using Lemma 2.5(iv),
$$D_q^Z(p) = \Big(\sum p_i (Zp)_i^{q-1}\Big)^{1/(1-q)} = \Big(\sum p_i |Z|^{1-q}\Big)^{1/(1-q)} = |Z|,$$
as required. □
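Lemma 2.6 can be checked numerically. In the following self-contained Python sketch (helper names are hypothetical), the matrix $Z$ is chosen with off-diagonal entries below $1/(n-1) = 1/2$, so that by Lemma 2.19 below it is positive definite with a positive weighting. Then $p = w/|Z|$ is a weight distribution by Lemma 2.5(ii), and its diversity of every order should equal $|Z|$. The solver omits a singularity check because this $Z$ is invertible.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A invertible)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def diversity(Z, p, q):
    """D_q^Z(p) for q in [0, inf]; q = math.inf gives 1 / max (Zp)_i."""
    n = len(p)
    Zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    supp = [i for i in range(n) if p[i] > 0]
    if q == math.inf:
        return 1 / max(Zp[i] for i in supp)
    if q == 1:
        return math.exp(-sum(p[i] * math.log(Zp[i]) for i in supp))
    return sum(p[i] * Zp[i] ** (q - 1) for i in supp) ** (1 / (1 - q))


Z = [[1.0, 0.3, 0.1],
     [0.3, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
w = solve(Z, [1.0, 1.0, 1.0])      # the weighting: Zw = (1, 1, 1)^t
magnitude = sum(w)                 # |Z|
p = [wi / magnitude for wi in w]   # weight distribution w / |Z|
for q in [0, 0.5, 1, 2, 10, math.inf]:
    print(q, diversity(Z, p, q), magnitude)
```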






Invariant distributions 

Definition 2.7 Let $Z$ be a similarity matrix. A distribution $p$ is invariant if $D_q^Z(p) = D_{q'}^Z(p)$ for all $q, q' \in [0, \infty]$.

Soon we will classify the invariant distributions. To do so, we need some more notation and a lemma.

Given a similarity matrix $Z$ and a subset $B \subseteq \{1, \ldots, n\}$, let $Z_B$ be the matrix $Z$ restricted to $B$, so that $(Z_B)_{ij} = Z_{ij}$ ($i, j \in B$). If $B$ has $m$ elements then $Z_B$ is an $m \times m$ matrix, but it will be more convenient to index the rows and columns of $Z_B$ by the elements of $B$ themselves than by $1, \ldots, m$.

We will also need to consider distributions on subsets of $\{1, \ldots, n\}$. A distribution on $B \subseteq \{1, \ldots, n\}$ is said to be invariant, a weight distribution, etc., if it is invariant, a weight distribution, etc., with respect to $Z_B$. Similarly, we will sometimes speak of 'weightings on $B$', meaning weightings on $Z_B$. Distributions are understood to be on $\{1, \ldots, n\}$ unless specified otherwise.

Lemma 2.8 Let $Z$ be a similarity matrix, let $B \subseteq \{1, \ldots, n\}$, and let $r$ be a distribution on $B$. Write $p$ for the distribution obtained by extending $r$ by zero. Then $D_q^{Z_B}(r) = D_q^Z(p)$ for all $q \in [0, \infty]$. In particular, $r$ is invariant if and only if $p$ is.

Proof For $i \in B$ we have $r_i = p_i$ and $(Z_B r)_i = (Zp)_i$. The result follows immediately from the definition of diversity of order $q$. □

By Lemma 2.6, any weight distribution is invariant, and by Lemma 2.8, any extension by zero of a weight distribution is also invariant. We will prove that these are all the invariant distributions there are.

For a distribution $p$ we write $\mathrm{supp}(p) = \{i \in \{1, \ldots, n\} \mid p_i > 0\}$, the support of $p$.

Let $Z$ be a similarity matrix. Given $\emptyset \neq B \subseteq \{1, \ldots, n\}$ and a non-negative weighting $w$ on $Z_B$, let $\hat{w}$ be the distribution obtained by first taking the weight distribution $w/|Z_B|$ on $B$, then extending by zero to $\{1, \ldots, n\}$.

Proposition 2.9 Let $Z$ be a similarity matrix and $p$ a distribution. The following are equivalent:

i. $p$ is invariant

ii. $(Zp)_i = (Zp)_j$ for all $i, j \in \mathrm{supp}(p)$

iii. $p$ is the extension by zero of a weight distribution on a nonempty subset of $\{1, \ldots, n\}$

iv. $p = \hat{w}$ for some non-negative weighting $w$ on some nonempty subset of $\{1, \ldots, n\}$.

Proof (i $\Leftrightarrow$ ii): Follows from Lemma 1.2.

(ii $\Rightarrow$ iii): Suppose that (ii) holds, and write $B = \mathrm{supp}(p)$. The distribution $p$ on $\{1, \ldots, n\}$ restricts to a distribution $r$ on $B$. This $r$ is a weight distribution on $B$, since for all $i \in B$ we have $(Z_B r)_i = (Zp)_i$, which by (ii) is constant over $i \in B$. Clearly $p$ is the extension by zero of $r$.

(iii $\Rightarrow$ i): Suppose that $p$ is the extension by zero of a weight distribution $r$ on a nonempty subset $B \subseteq \{1, \ldots, n\}$. Then for all $q \in [0, \infty]$,
$$D_q^Z(p) = D_q^{Z_B}(r) = |Z_B|$$
by Lemmas 2.8 and 2.6 respectively; hence $p$ is invariant.

(iii $\Leftrightarrow$ iv): Follows from Lemma 2.5. □

There is at least one invariant distribution on any given similarity matrix. For we may choose $B$ to be a one-element subset, which has a unique non-negative weighting $w = (1)$, and this gives the invariant distribution $\hat{w} = (0, \ldots, 0, 1, 0, \ldots, 0)$.
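Proposition 2.9(ii) gives a test for invariance that needs no diversities at all. The following Python sketch compares that test with the definition for $Z = I$ and two distributions supported on $\{1, 2\}$. The function names and the tolerance `tol` are our own, hypothetical choices.

```python
import math


def z_times(Z, p):
    """The column vector Zp, as a list."""
    n = len(p)
    return [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]


def diversity(Z, p, q):
    """D_q^Z(p) for q in [0, inf], with q = math.inf allowed."""
    Zp = z_times(Z, p)
    supp = [i for i in range(len(p)) if p[i] > 0]
    if q == math.inf:
        return 1 / max(Zp[i] for i in supp)
    if q == 1:
        return math.exp(-sum(p[i] * math.log(Zp[i]) for i in supp))
    return sum(p[i] * Zp[i] ** (q - 1) for i in supp) ** (1 / (1 - q))


def is_invariant(Z, p, tol=1e-12):
    """Criterion (ii) of Proposition 2.9: (Zp)_i is constant over supp(p)."""
    Zp = z_times(Z, p)
    values = [Zp[i] for i in range(len(p)) if p[i] > 0]
    return max(values) - min(values) <= tol


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
p_inv = [0.5, 0.5, 0.0]  # extension by zero of the weight distribution (1/2, 1/2) on {1, 2}
p_not = [0.6, 0.4, 0.0]  # same support, but (Zp)_1 != (Zp)_2
for p in (p_inv, p_not):
    print(is_invariant(I3, p), [diversity(I3, p, q) for q in (0, 1, 2, math.inf)])
```

The first distribution passes the test and has diversity 2 at every order; the second fails it, and its diversities strictly decrease in $q$.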

Maximizing distributions 

Definition 2.10 Let $Z$ be a similarity matrix. Given $q \in [0, \infty]$, a distribution $p$ is $q$-maximizing if $D_q^Z(p) \geq D_q^Z(p')$ for all distributions $p'$. A distribution is maximizing if it is $q$-maximizing for all $q \in [0, \infty]$.

It makes no difference to the definition of 'maximizing' if we omit $q = \infty$; nor does it make a difference to either definition if we replace diversity $D_q^Z$ by entropy $H_q^Z$.

We will eventually show that every similarity matrix has a maximizing dis- 
tribution. 

Lemma 2.11 Let $Z$ be a similarity matrix and $p$ an invariant distribution. Then $p$ is 0-maximizing if and only if it is maximizing.

Proof Suppose that $p$ is 0-maximizing. Then for all $q \in [0, \infty]$ and all distributions $p'$,
$$D_q^Z(p) = D_0^Z(p) \geq D_0^Z(p') \geq D_q^Z(p'),$$
using invariance in the first step and Lemma 1.2 in the last. □

Lemma 2.12 Let $Z$ be a similarity matrix and $B \subseteq \{1, \ldots, n\}$. Suppose that $\sup_r D_0^{Z_B}(r) \geq \sup_p D_0^Z(p)$, where the first supremum is over distributions $r$ on $B$ and the second is over distributions $p$ on $\{1, \ldots, n\}$. Suppose also that $Z_B$ admits an invariant maximizing distribution. Then so does $Z$.

(In fact, $\sup_r D_0^{Z_B}(r) \leq \sup_p D_0^Z(p)$ in any case, by Lemma 2.8. So the '$\geq$' in the statement of the present lemma could equivalently be replaced by '$=$'.)

Proof Let $r$ be an invariant maximizing distribution on $Z_B$. Define a distribution $p$ on $\{1, \ldots, n\}$ by extending $r$ by zero. By Lemma 2.8, $p$ is invariant. Using Lemma 2.8 again,
$$D_0^Z(p) = D_0^{Z_B}(r) = \sup_{r'} D_0^{Z_B}(r') \geq \sup_{p'} D_0^Z(p'),$$
so $p$ is 0-maximizing. Then by Lemma 2.11, $p$ is maximizing. □



Decomposition Let $Z$ be a similarity matrix. Subsets $B$ and $B'$ of $\{1, \ldots, n\}$ are complementary (for $Z$) if $B \cup B' = \{1, \ldots, n\}$, $B \cap B' = \emptyset$, and $Z_{ii'} = 0$ for all $i \in B$ and $i' \in B'$. For example, there exist nonempty complementary subsets if $Z$ can be expressed as a nontrivial block sum
$$Z = \begin{pmatrix} X & 0 \\ 0 & X' \end{pmatrix}.$$
Given a distribution $p$ and a subset $B \subseteq \{1, \ldots, n\}$ such that $p_i > 0$ for some $i \in B$, let $p|_B$ be the distribution on $B$ defined by
$$(p|_B)_i = \frac{p_i}{\sum_{j \in B} p_j} \qquad (i \in B).$$


Lemma 2.13 Let $Z$ be a similarity matrix, and let $B$ and $B'$ be nonempty complementary subsets of $\{1, \ldots, n\}$. Then:

i. For any weightings $v$ on $Z_B$ and $v'$ on $Z_{B'}$, there is a weighting $w$ on $Z$ defined by
$$w_i = \begin{cases} v_i & \text{if } i \in B,\\ v'_i & \text{if } i \in B'. \end{cases}$$

ii. For any invariant distributions $r$ on $B$ and $r'$ on $B'$, there exists an invariant distribution $p$ on $\{1, \ldots, n\}$ such that $p|_B = r$ and $p|_{B'} = r'$.

Proof

i. For $i \in B$, we have
$$(Zw)_i = \sum_{j \in B} Z_{ij} v_j + \sum_{j \in B'} Z_{ij} v'_j = \sum_{j \in B} (Z_B)_{ij} v_j = (Z_B v)_i = 1.$$
Similarly, $(Zw)_i = 1$ for all $i \in B'$. So $w$ is a weighting.

ii. By Proposition 2.9, $r = \hat{v}$ for some non-negative weighting $v$ on some nonempty subset $C \subseteq B$. Similarly, $r' = \hat{v'}$ for some non-negative weighting $v'$ on some nonempty subset $C' \subseteq B'$. Since $C$ and $C'$ are complementary subsets of $C \cup C'$ (for $Z_{C \cup C'}$), part (i) gives a non-negative weighting $w$ on the nonempty set $C \cup C'$ defined by
$$w_i = \begin{cases} v_i & \text{if } i \in C,\\ v'_i & \text{if } i \in C'. \end{cases}$$
Let $p = \hat{w}$, a distribution on $\{1, \ldots, n\}$, which is invariant by Proposition 2.9. For $i \in C$ we have
$$(p|_B)_i = \frac{p_i}{\sum_{j \in B} p_j} = \frac{w_i/|Z_{C \cup C'}|}{\sum_{j \in C} w_j/|Z_{C \cup C'}|} = \frac{v_i}{|Z_C|} = r_i.$$
For $i \in B \setminus C$ we have
$$(p|_B)_i = \frac{p_i}{\sum_{j \in B} p_j} = 0 = r_i.$$
Hence $p|_B = r$, and similarly $p|_{B'} = r'$. □

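Lemma 2.14 Let $Z$ be a similarity matrix, let $B$ and $B'$ be complementary subsets of $\{1, \ldots, n\}$, and let $p$ be a distribution on $\{1, \ldots, n\}$ such that $p_i > 0$ for some $i \in B$ and $p_i > 0$ for some $i \in B'$. Then
$$D_0^Z(p) = D_0^{Z_B}(p|_B) + D_0^{Z_{B'}}(p|_{B'}).$$

Proof By definition,
$$D_0^Z(p) = \sum \frac{p_i}{(Zp)_i} = \sum_{i \in B:\, p_i > 0} \frac{p_i}{(Zp)_i} + \sum_{i \in B':\, p_i > 0} \frac{p_i}{(Zp)_i}.$$
Now for $i \in B$,
$$p_i = \Big(\sum_{j \in B} p_j\Big) (p|_B)_i$$
by definition of $p|_B$, and
$$(Zp)_i = \sum_{j \in B} Z_{ij} p_j + \sum_{j \in B'} Z_{ij} p_j = \sum_{j \in B} (Z_B)_{ij} p_j = \Big(\sum_{j \in B} p_j\Big) (Z_B\, p|_B)_i.$$
Similar equations hold for $B'$, so
$$D_0^Z(p) = \sum_{i \in B:\, p_i > 0} \frac{(p|_B)_i}{(Z_B\, p|_B)_i} + \sum_{i \in B':\, p_i > 0} \frac{(p|_{B'})_i}{(Z_{B'}\, p|_{B'})_i} = D_0^{Z_B}(p|_B) + D_0^{Z_{B'}}(p|_{B'}),$$
as required. □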
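Lemma 2.14 can be checked on a small block-diagonal example. In this Python sketch of our own, indices are 0-based, so $B = \{1, 2\}$ and $B' = \{3\}$ become `[0, 1]` and `[2]`. The hypothetical helper `restrict` builds $Z_B$ and $p|_B$ as defined above.

```python
def diversity0(Z, p):
    """D_0^Z(p): the sum over supp(p) of p_i / (Zp)_i."""
    n = len(p)
    Zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    return sum(p[i] / Zp[i] for i in range(n) if p[i] > 0)


def restrict(Z, p, B):
    """Return (Z_B, p|_B): the submatrix on B and p renormalized on B."""
    total = sum(p[i] for i in B)
    ZB = [[Z[i][j] for j in B] for i in B]
    return ZB, [p[i] / total for i in B]


Z = [[1.0, 0.4, 0.0],
     [0.4, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
B, B_prime = [0, 1], [2]   # complementary: Z_{ii'} = 0 for i in B, i' in B'
p = [0.2, 0.3, 0.5]
lhs = diversity0(Z, p)
rhs = diversity0(*restrict(Z, p, B)) + diversity0(*restrict(Z, p, B_prime))
print(lhs, rhs)  # equal up to rounding, as Lemma 2.14 asserts
```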

Proposition 2.15 Let $Z$ be a similarity matrix and let $B$ and $B'$ be nonempty complementary subsets of $\{1, \ldots, n\}$. Suppose that $Z_B$ and $Z_{B'}$ each admit an invariant maximizing distribution. Then so does $Z$.

Proof Choose invariant maximizing distributions $r$ on $B$ and $r'$ on $B'$. By Lemma 2.13(ii), there exists an invariant distribution $p$ on $\{1, \ldots, n\}$ such that $p|_B = r$ and $p|_{B'} = r'$. I claim that $p$ is maximizing. Indeed, let $s$ be a distribution on $\{1, \ldots, n\}$. If $s_i > 0$ for some $i \in B$ and $s_i > 0$ for some $i \in B'$ then
$$D_0^Z(s) = D_0^{Z_B}(s|_B) + D_0^{Z_{B'}}(s|_{B'}) \leq D_0^{Z_B}(r) + D_0^{Z_{B'}}(r')$$
by Lemma 2.14. If not then, without loss of generality, $s_i = 0$ for all $i \in B'$; then $s_i > 0$ for some $i \in B$, and
$$D_0^Z(s) = D_0^{Z_B}(s|_B) \leq D_0^{Z_B}(r) \leq D_0^{Z_B}(r) + D_0^{Z_{B'}}(r')$$
by Lemma 2.8. So in any case we have
$$D_0^Z(s) \leq D_0^{Z_B}(r) + D_0^{Z_{B'}}(r') = D_0^{Z_B}(p|_B) + D_0^{Z_{B'}}(p|_{B'}) = D_0^Z(p),$$
using Lemma 2.14 in the last step. Hence $p$ is 0-maximizing, and by Lemma 2.11, $p$ is maximizing. □

Positive definite similarity matrices The solution to the maximum diversity problem turns out to be simpler when the similarity matrix is positive definite and satisfies certain further conditions. Here are some preparatory results. They are not needed for the proof of the main theorem (3.1) itself, but will be used for the corollaries in Section 4.

Lemma 2.16 Let $Z$ be a positive definite similarity matrix. Then $Z$ has a unique weighting and $|Z| > 0$.

Proof A positive definite matrix is invertible, so $Z$ has a unique weighting $w$. By the definitions of magnitude and weighting,
$$|Z| = \sum_{i=1}^n w_i = w^t Z w.$$
But $n \geq 1$, so $0$ is not a weighting, so $w \neq 0$; then since $Z$ is positive definite, $w^t Z w > 0$. □

Lemma 2.17 Let $Z$ be a positive definite similarity matrix. Then
$$|Z| = \sup_x \frac{\big(\sum_{i=1}^n x_i\big)^2}{x^t Z x},$$
where the supremum is over all column vectors $x \neq 0$. The points at which the supremum is attained are exactly the nonzero scalar multiples of the unique weighting on $Z$.
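Proof Since $Z$ is positive definite, there is an inner product $\langle -, - \rangle$ on $\mathbb{R}^n$ defined by
$$\langle x, y \rangle = x^t Z y$$
($x, y \in \mathbb{R}^n$). The Cauchy-Schwarz inequality states that for all $x, y \in \mathbb{R}^n$,
$$\langle x, x \rangle \cdot \langle y, y \rangle \geq \langle x, y \rangle^2$$
with equality if and only if one of $x$ and $y$ is a scalar multiple of the other. Let $y$ be the unique weighting on $Z$. Then $\langle x, y \rangle = x^t Z y = \sum_i x_i$ and $\langle y, y \rangle = \sum_i y_i = |Z|$, so the inequality states that for all $x \in \mathbb{R}^n$,
$$x^t Z x \cdot |Z| \geq \Big(\sum_{i=1}^n x_i\Big)^2.$$
Since $y \neq 0$, equality holds if and only if $x$ is a scalar multiple of $y$. The result follows. □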
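Lemma 2.17 can be illustrated numerically. The sketch below (our own, in Python) uses $Z = \begin{pmatrix} 1 & a \\ a & 1 \end{pmatrix}$ with $0 \leq a < 1$, which is positive definite with weighting $(1/(1+a), 1/(1+a))^t$, so $|Z| = 2/(1+a)$. Random nonzero vectors never push the ratio above $|Z|$, while multiples of the weighting attain it. The sample size and seed are arbitrary choices.

```python
import random


def ratio(Z, x):
    """(sum_i x_i)^2 / (x^t Z x) for a nonzero vector x."""
    n = len(x)
    quad = sum(x[i] * Z[i][j] * x[j] for i in range(n) for j in range(n))
    return sum(x) ** 2 / quad


a = 0.5
Z = [[1.0, a], [a, 1.0]]          # positive definite for 0 <= a < 1
w = [1 / (1 + a), 1 / (1 + a)]    # its weighting: Zw = (1, 1)^t
magnitude = sum(w)                # |Z| = 2 / (1 + a)

rng = random.Random(0)
samples = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(10000)]
best = max(ratio(Z, x) for x in samples)
print(best, magnitude, ratio(Z, [3 * wi for wi in w]))
```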

A vector $x$ is nowhere zero if $x_i \neq 0$ for all $i$.

Proposition 2.18 Let $Z$ be a positive definite similarity matrix and $B$ a nonempty subset of $\{1, \ldots, n\}$. Then $Z_B$ is positive definite and $|Z_B| \leq |Z|$. The inequality is strict if $B$ is a proper subset and the unique weighting on $Z$ is nowhere zero.

Proof Suppose without loss of generality that $B = \{1, \ldots, m\}$, where $0 < m \leq n$. Let $y$ be an $m$-dimensional column vector and write
$$x = (y_1, \ldots, y_m, 0, \ldots, 0)^t.$$
Then
$$y^t Z_B y = \sum_{i,j=1}^m y_i Z_{ij} y_j = \sum_{i,j=1}^n x_i Z_{ij} x_j = x^t Z x \qquad (1)$$
and
$$\Big(\sum_{i=1}^m y_i\Big)^2 = \Big(\sum_{i=1}^n x_i\Big)^2. \qquad (2)$$
By (1) and positive definiteness of $Z$, we have $y^t Z_B y \geq 0$, with equality if and only if $x = 0$, if and only if $y = 0$. So $Z_B$ is positive definite. Then by (1), (2) and Lemma 2.17, $|Z_B| \leq |Z|$.

Now suppose that $m < n$ and the weighting $w$ on $Z$ is nowhere zero. The supremum in Lemma 2.17 is attained only at nonzero scalar multiples of $w$; in particular, any vector $x$ at which it is attained satisfies $x_n \neq 0$. Let $y$ be the unique weighting on $Z_B$ and let $x$ be the corresponding $n$-dimensional column vector, as above. Since $x_n = 0$, we have
$$|Z_B| = \frac{\big(\sum_{i=1}^m y_i\big)^2}{y^t Z_B y} = \frac{\big(\sum_{i=1}^n x_i\big)^2}{x^t Z x} < |Z|,$$
as required. □
Lemma 2.19 Let $Z$ be a similarity matrix with $Z_{ij} < 1/(n-1)$ for all $i \neq j$. Then $Z$ is positive definite, and the unique weighting on $Z$ is positive.

Proof Theorem 2 of [LW] shows that $Z$ is positive definite. Now, for each $i \in \{1, \ldots, n\}$ and $r \geq 0$, put
$$c_{i,r} = \sum Z_{i_0 i_1} Z_{i_1 i_2} \cdots Z_{i_{r-1} i_r},$$
where the sum is over all $i_0, \ldots, i_r \in \{1, \ldots, n\}$ such that $i_0 = i$ and $i_{s-1} \neq i_s$ whenever $1 \leq s \leq r$. In particular, $c_{i,0} = 1$. Write $\gamma = \max_{j \neq k} Z_{jk}$. Then for all $r \geq 0$,
$$c_{i,r+1} \leq \sum Z_{i_0 i_1} Z_{i_1 i_2} \cdots Z_{i_{r-1} i_r} \cdot (n-1)\gamma = (n-1)\gamma \cdot c_{i,r}. \qquad (3)$$
Hence $c_{i,r} \leq ((n-1)\gamma)^r$ for all $r \geq 0$; and $(n-1)\gamma < 1$, so the sum $w_i := \sum_{r=0}^\infty (-1)^r c_{i,r}$ converges. Again using (3), we have $c_{i,r+1} \leq c_{i,r}$ for all $r$, and $c_{i,1} \leq (n-1)\gamma < 1 = c_{i,0}$, so $w_i > 0$.

It remains to show that $w = (w_1, \ldots, w_n)$ is a weighting. Let $i \in \{1, \ldots, n\}$. Since $\sum_{j \neq i} Z_{ij} c_{j,r} = c_{i,r+1}$ for all $r$,
$$(Zw)_i = w_i + \sum_{j \neq i} Z_{ij} \sum_{r=0}^\infty (-1)^r c_{j,r} = w_i - \sum_{r=0}^\infty (-1)^{r+1} c_{i,r+1} = w_i - (w_i - c_{i,0}) = 1,$$
as required. □
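The series in the proof of Lemma 2.19 can also be summed numerically. The identity $c_{i,r+1} = \sum_{j \neq i} Z_{ij} c_{j,r}$ used above is a recurrence, so the following Python sketch of our own approximates $w$ and checks that $Zw \approx (1, \ldots, 1)^t$. Truncating at `terms` summands is an assumption that is harmless here, because the terms shrink at least as fast as $((n-1)\gamma)^r$.

```python
def series_weighting(Z, terms=200):
    """Approximate w_i = sum_{r >= 0} (-1)^r c_{i,r} from the proof of Lemma 2.19.

    Uses c_{i,0} = 1 and c_{i,r+1} = sum_{j != i} Z_ij c_{j,r} (split off the
    first step of each path). Convergence needs Z_ij < 1/(n - 1) for i != j.
    """
    n = len(Z)
    c = [1.0] * n      # c_{i,0}
    w = [0.0] * n
    sign = 1.0
    for _ in range(terms):
        w = [wi + sign * ci for wi, ci in zip(w, c)]
        c = [sum(Z[i][j] * c[j] for j in range(n) if j != i) for i in range(n)]
        sign = -sign
    return w


Z = [[1.0, 0.3, 0.1],
     [0.3, 1.0, 0.2],
     [0.1, 0.2, 1.0]]   # off-diagonal entries < 1/(n - 1) = 0.5
w = series_weighting(Z)
Zw = [sum(Z[i][j] * w[j] for j in range(3)) for i in range(3)]
print(w)   # every entry positive, as Lemma 2.19 asserts
print(Zw)  # approximately (1, 1, 1)
```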
3 The main theorem

Solution to the maximum diversity problem

Theorem 3.1 (Main Theorem) Let $Z$ be a similarity matrix. Then:
i. For all $q \in [0, \infty]$,
$$\sup_p D_q^Z(p) = \max_B |Z_B| \qquad (4)$$
where the supremum is over all distributions $p$ and the maximum is over all subsets $B \subseteq \{1, \ldots, n\}$ such that $Z_B$ admits a non-negative weighting.

ii. The maximizing distributions are precisely those of the form $\hat{w}$, where $w$ is a non-negative weighting on a subset $B \subseteq \{1, \ldots, n\}$ such that $|Z_B|$ attains the maximum (4).

In particular, there exists a maximizing distribution, and the maximum diversity of order $q$ is the same for all $q \in [0, \infty]$.

For the definitions, including that of 'maximizing distribution', see Section 2. 
The proof is given later in this section. First we make some remarks on compu- 
tation and on maximum entropy. 

The maximum diversity of a similarity matrix $Z$ is $D_{\max}(Z) := \sup_p D_q^Z(p)$, which by Theorem 3.1 is independent of the value of $q \in [0, \infty]$.

Remarks on computation Suppose that we are given a similarity matrix $Z$ and want to compute its maximizing distribution(s) and maximum diversity. The theorem gives the following algorithm. For each of the $2^n$ subsets $B$ of $\{1, \ldots, n\}$:

• perform some simple linear algebra to decide whether $Z_B$ admits a non-negative weighting

• if it does, tag $B$ as 'good' and record the magnitude $|Z_B|$ (the sum of the entries of any weighting).

The maximum of all the recorded magnitudes is the maximum diversity $D_{\max}(Z)$. For each good $B$ such that $|Z_B| = D_{\max}(Z)$, find all non-negative weightings $w$ on $Z_B$; the corresponding distributions $\hat{w}$ are the maximizing distributions.

This algorithm takes exponentially many steps. However, each step is fast, 
so it might be possible to handle reasonably large values of n in a reasonable 
length of time. Moreover, the results of Section 4 may allow the speed of the 
algorithm to be improved. 
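The search just described can be sketched in Python as follows. This is a hedged illustration, not the author's implementation. To keep it short, it assumes that every principal submatrix $Z_B$ is invertible (as happens, for instance, when $Z$ is positive definite, by Proposition 2.18), so that deciding whether $Z_B$ admits a non-negative weighting reduces to solving $Z_B w = (1, \ldots, 1)^t$ and checking signs. Subsets with singular $Z_B$ are simply skipped; handling them in general would need a linear feasibility check instead. The names `solve` and `max_diversity` are our own.

```python
from itertools import combinations


def solve(A, b, eps=1e-12):
    """Gaussian elimination with partial pivoting; returns None if A is singular."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def max_diversity(Z, tol=1e-9):
    """Brute-force search over all nonempty subsets B of {0, ..., n-1}.

    Assumes each Z_B is invertible (singular Z_B are skipped).
    Returns (D_max, list of maximizing distributions found).
    """
    n = len(Z)
    good = []  # triples (|Z_B|, B, weighting on B)
    for m in range(1, n + 1):
        for B in combinations(range(n), m):
            ZB = [[Z[i][j] for j in B] for i in B]
            w = solve(ZB, [1.0] * m)
            if w is None or min(w) < -tol:
                continue  # no (unique) non-negative weighting on Z_B
            good.append((sum(w), B, w))
    d_max = max(mag for mag, _, _ in good)  # singletons are always good
    maximizers = []
    for mag, B, w in good:
        if abs(mag - d_max) <= tol:
            p = [0.0] * n  # the distribution w / |Z_B|, extended by zero
            for i, wi in zip(B, w):
                p[i] = max(wi, 0.0) / mag
            maximizers.append(p)
    return d_max, maximizers


Z = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.9],
     [0.1, 0.9, 1.0]]
d_max, maximizers = max_diversity(Z)
print(d_max, maximizers)
```

In this example the weighting on the whole of $\{1, 2, 3\}$ has negative entries, so the search settles on $B = \{1, 3\}$, giving $D_{\max}(Z) = 2/1.1 \approx 1.818$ and the single maximizing distribution $(1/2, 0, 1/2)$: the middle species, which is similar to both others, is left out.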
Solution to the maximum entropy problem We can translate the solution to the maximum diversity problem into a solution to the maximum entropy problem. The first part, giving the value of the maximum, becomes more complicated. The second part, giving the maximizing distribution(s), is unchanged.

Theorem 3.2 Let $Z$ be a similarity matrix. Then:

i. For all $q \in [0, \infty)$,
$$\sup_p H_q^Z(p) = \begin{cases} \max_B \dfrac{1}{q-1}\big(1 - |Z_B|^{1-q}\big) & \text{if } q \neq 1,\\[1ex] \max_B \log |Z_B| & \text{if } q = 1, \end{cases} \qquad (5)$$
where the supremum is over all distributions $p$ and the maxima are over all subsets $B \subseteq \{1, \ldots, n\}$ such that $Z_B$ admits a non-negative weighting.

ii. The maximizing distributions are precisely those of the form $\hat{w}$, where $w$ is a non-negative weighting on a subset $B \subseteq \{1, \ldots, n\}$ such that $|Z_B|$ is maximal among all subsets admitting a non-negative weighting.

In particular, there exists a maximizing distribution.



Proof This follows almost immediately from Theorem 3.1, using the definition of $D_q^Z$ in terms of $H_q^Z$. Note that on the right-hand side of (5), the expressions $\frac{1}{q-1}\big(1 - |Z_B|^{1-q}\big)$ and $\log |Z_B|$ are increasing, injective functions of $|Z_B|$, so a subset $B$ maximizes any one of them if and only if it maximizes $|Z_B|$. □

The part of Theorem 3.1 stating that the maximum diversity of order q is 
the same for all values of q has no clean statement in terms of entropy. 
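As a check on Theorem 3.2, consider the case $Z = I$ (a worked example of our own, using only the results above). For every nonempty $B$, the matrix $Z_B$ is an identity matrix with unique weighting $(1, \ldots, 1)$, so $|Z_B| = \#B$, the number of elements of $B$. Hence:

```latex
\begin{aligned}
\max_B |Z_B| &= n, \text{ attained only at } B = \{1, \ldots, n\},
  \text{ with maximizing distribution } \hat{w} = (1/n, \ldots, 1/n);\\
\sup_p D_q^I(p) &= n \qquad (q \in [0, \infty]);\\
\sup_p H_q^I(p) &= \frac{1 - n^{1-q}}{q - 1} \quad (q \neq 1),
  \qquad \sup_p H_1^I(p) = \log n.
\end{aligned}
```

For $q = 2$ this gives $1 - 1/n$, the largest possible value of the Gini-Simpson index on $n$ species.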

Diversity of order zero Our proof of Theorem 3.1 will depend on an analysis of the function $D_0^Z$, diversity of order zero. The first step is to find its critical points, and for that we need a technical lemma.

Lemma 3.3 Let $m \geq 1$, let $Y$ be an $m \times m$ real skew-symmetric matrix, and let $x \in (0, \infty)^m$. Suppose that $Y_{ij} \geq 0$ whenever $i > j$ and that $\sum_{j=1}^m Y_{ij} x_j$ is independent of $i \in \{1, \ldots, m\}$. Then $Y = 0$.

Proof This is true for $m = 1$; suppose inductively that $m \geq 2$. We have
$$\sum_{j=1}^m Y_{1j} x_j = \sum_{j=1}^m Y_{mj} x_j,$$
with $Y_{1j} x_j = -Y_{j1} x_j \leq 0$ and $Y_{mj} x_j \geq 0$ for all $j$; hence both sides are $0$ and $Y_{mj} x_j = 0$ for all $j$. So for all $j$ we have $Y_{mj} = 0$ (since $x_j > 0$) and $Y_{jm} = 0$ (by skew-symmetry). Let $Y'$ be the $(m-1) \times (m-1)$ matrix defined by $Y'_{ij} = Y_{ij}$. Then $Y'$ satisfies the conditions of the inductive hypothesis (with $x$ replaced by $(x_1, \ldots, x_{m-1})$), so $Y' = 0$; that is, $Y_{ij} = 0$ whenever $i, j < m$. But we already have $Y_{ij} = 0$ whenever $i = m$ or $j = m$, so $Y = 0$, completing the induction. □
Write
$$\Delta_n = \Big\{p \in \mathbb{R}^n \;\Big|\; \sum_i p_i = 1,\ p_i \geq 0\Big\}$$
for the space of distributions, and
$$\Delta_n^\circ = \Big\{p \in \mathbb{R}^n \;\Big|\; \sum_i p_i = 1,\ p_i > 0\Big\}$$
for the space of nowhere-zero distributions. The function $D_0^Z$ on $\Delta_n$ is given by
$$D_0^Z(p) = \sum \frac{p_i}{(Zp)_i}.$$
(Recall the standing convention that unlabelled summations are over all $i \in \{1, \ldots, n\}$ such that $p_i > 0$.) It can be defined, using the same formula, for all $p \in [0, \infty)^n$. It is then differentiable on $(0, \infty)^n$, where the summation is over all $i \in \{1, \ldots, n\}$.

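Proposition 3.4 Let $Z$ be a similarity matrix and $p \in \Delta_n^\circ$. Then $p$ is a critical point of $D_0^Z$ on $\Delta_n^\circ$ if and only if for all $i, j \in \{1, \ldots, n\}$,
$$Z_{ij} > 0 \implies (Zp)_i = (Zp)_j.$$

Proof We find the critical points of $D_0^Z$ on $\Delta_n^\circ$ using Lagrange multipliers and the fact that $\Delta_n^\circ$ is the intersection of $(0, \infty)^n$ with the hyperplane $\{p \in \mathbb{R}^n \mid \sum p_i = 1\}$. Write $h(p) = \sum_{i=1}^n p_i$.

For $k, i \in \{1, \ldots, n\}$ and $p \in (0, \infty)^n$ we have $\frac{\partial}{\partial p_k}(Zp)_i = Z_{ik}$, giving
$$\frac{\partial}{\partial p_k}\left(\frac{p_i}{(Zp)_i}\right) = \begin{cases} \dfrac{1}{(Zp)_k} - \dfrac{p_k}{(Zp)_k^2} & \text{if } i = k,\\[1ex] -\dfrac{Z_{ik}\, p_i}{(Zp)_i^2} & \text{otherwise.} \end{cases}$$
From this and symmetry of $Z$ we deduce that for $k \in \{1, \ldots, n\}$ and $p \in (0, \infty)^n$,
$$\frac{\partial D_0^Z}{\partial p_k}(p) = \frac{1}{(Zp)_k} - \sum_{i=1}^n \frac{Z_{ki}\, p_i}{(Zp)_i^2} = \sum_{i=1}^n Y_{ki}\, p_i,$$
where
$$Y_{ki} = Z_{ki}\left(\frac{1}{(Zp)_k^2} - \frac{1}{(Zp)_i^2}\right).$$
(The second equality uses $1/(Zp)_k = \sum_i Z_{ki} p_i/(Zp)_k^2$.) On the other hand,
$$\frac{\partial h}{\partial p_k}(p) = 1$$
for all $k$. A point $p \in \Delta_n^\circ$ is a critical point of $D_0^Z$ on $\Delta_n^\circ$ if and only if there exists a scalar $\lambda$ such that $(\nabla D_0^Z)(p) = \lambda (\nabla h)(p)$. Hence $p$ is critical if and only if $\sum_i Y_{ki} p_i$ is independent of $k \in \{1, \ldots, n\}$. So the proposition is equivalent to the statement that, for $p \in \Delta_n^\circ$, the sum $\sum_i Y_{ki} p_i$ is independent of $k$ if and only if the matrix $Y$ is $0$.

The 'if' direction is trivial. Conversely, suppose that $\sum_i Y_{ki} p_i$ is independent of $k$. Assume without loss of generality that $(Zp)_1 \geq \cdots \geq (Zp)_n$. Then $Y$ is skew-symmetric (by symmetry of $Z$) and $Y_{ki} \geq 0$ whenever $k > i$, so Lemma 3.3 applies (taking $x = p$), and $Y = 0$. □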
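The gradient formula can be sanity-checked against finite differences. In the following Python sketch of our own, `grad_d0` transcribes $\sum_i Y_{ki} p_i$, and the step size `h` is an arbitrary choice. $D_0^Z$ is evaluated with the sum over all $i$, as is appropriate on $(0, \infty)^n$.

```python
def zp(Z, p):
    """The column vector Zp, as a list."""
    n = len(p)
    return [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]


def d0(Z, p):
    """D_0^Z(p) = sum_i p_i / (Zp)_i, for p with all entries positive."""
    return sum(p_i / z_i for p_i, z_i in zip(p, zp(Z, p)))


def grad_d0(Z, p):
    """Gradient of D_0^Z on (0, inf)^n, computed as sum_i Y_ki p_i."""
    Zp = zp(Z, p)
    n = len(p)
    Y = [[Z[k][i] * (1 / Zp[k] ** 2 - 1 / Zp[i] ** 2) for i in range(n)]
         for k in range(n)]
    return [sum(Y[k][i] * p[i] for i in range(n)) for k in range(n)]


def numeric_grad(f, p, h=1e-6):
    """Central finite-difference gradient of f at p."""
    g = []
    for k in range(len(p)):
        up, down = list(p), list(p)
        up[k] += h
        down[k] -= h
        g.append((f(up) - f(down)) / (2 * h))
    return g


Z = [[1.0, 0.3, 0.1],
     [0.3, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
p = [0.5, 0.3, 0.2]
print(grad_d0(Z, p))
print(numeric_grad(lambda x: d0(Z, x), p))
```

At a weight distribution, such as $p = (1/2, 1/2)$ for $Z = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}$, all $(Zp)_i$ coincide, so $Y = 0$ and `grad_d0` returns the zero vector, in line with Proposition 3.4.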

Let $\sim$ be the equivalence relation on $\{1, \ldots, n\}$ generated by $i \sim j$ whenever $Z_{ij} > 0$. Thus, $i \sim j$ if and only if there is a chain $i = i_1, i_2, \ldots, i_{r-1}, i_r = j$ with $Z_{i_t i_{t+1}} > 0$ for all $t$. Call $Z$ connected if $i \sim j$ for all $i, j$.

Corollary 3.5 Let $Z$ be a connected similarity matrix. Then every critical point of $D_0^Z$ in $\Delta_n^\circ$ is a weight distribution. □

We are aiming to show, among other things, that there is a 0-maximizing distribution: the function $D_0^Z$ on $\Delta_n$ attains its supremum. Its supremum is finite, since by Lemma 1.1, $D_0^Z(p) \leq n$ for all distributions $p$. If $D_0^Z$ is continuous on $\Delta_n$ then it certainly does attain its supremum. But in general, it is not. For example, if $Z = I$ then $D_0^I(p)$ is the cardinality of the support of $p$, which is not continuous in $p$. We must therefore use another argument to establish the existence of a 0-maximizing distribution. The following lemma will help us.

Lemma 3.6 Let $Z$ be a connected similarity matrix and let $(p^k)_{k \in \mathbb{N}}$ be a sequence in $\Delta_n$. Then $(p^k)_{k \in \mathbb{N}}$ has a subsequence $(p^k)_{k \in S}$ satisfying at least one of the following conditions:

i. there is some $i \in \{1, \ldots, n\}$ such that $p^k_i = 0$ for all $k \in S$

ii. the subsequence $(p^k)_{k \in S}$ lies in $\Delta_n^\circ$ and converges to some point of $\Delta_n^\circ$

iii. the subsequence $(p^k)_{k \in S}$ lies in $\Delta_n^\circ$, and there is some $i \in \{1, \ldots, n\}$ such that $\lim_{k \in S} \big(p^k_i/(Zp^k)_i\big) = 0$.

Here and in what follows, we treat sequences as families $(x_k)_{k \in T}$ indexed over some infinite subset $T$ of $\mathbb{N}$. A subsequence of such a sequence therefore amounts to an infinite subset of $T$.

Proof If there exist infinitely many pairs $(k, i) \in \mathbb{N} \times \{1, \ldots, n\}$ such that $p^k_i = 0$ then there is some $i \in \{1, \ldots, n\}$ such that $\{k \in \mathbb{N} \mid p^k_i = 0\}$ is infinite. Taking $S = \{k \in \mathbb{N} \mid p^k_i = 0\}$ then gives condition (i).

Suppose, then, that there are only finitely many such pairs. We may choose a subsequence $(p^k)_{k \in Q}$ of $(p^k)_{k \in \mathbb{N}}$ lying in $\Delta_n^\circ$. Further, since $\Delta_n$ is compact, we may choose a subsequence $(p^k)_{k \in R}$ of $(p^k)_{k \in Q}$ converging to some point $p \in \Delta_n$. If $p \in \Delta_n^\circ$ then $(p^k)_{k \in R}$ satisfies (ii).

Suppose, then, that $p \notin \Delta_n^\circ$; say $p_\ell = 0$, where $\ell \in \{1, \ldots, n\}$.

Define a binary relation $\preceq$ on $\{1, \ldots, n\}$ by $i \preceq j$ if and only if $(p^k_i/p^k_j)_{k \in R}$ is bounded (that is, bounded above). Then $\preceq$ is reflexive and transitive, and if $i \preceq j$ and $p_j = 0$ then $p_i = 0$. Write $i \approx j$ for $i \preceq j \preceq i$; then $\approx$ is an equivalence relation.

I claim that there exist $i, j \in \{1, \ldots, n\}$ with $Z_{ij} > 0$ and $i \not\preceq j$. For if not, the equivalence relation $\approx$ satisfies $Z_{ij} > 0 \Rightarrow i \approx j$, and since $Z$ is connected, $i \approx j$ for all $i, j$. But then $i \preceq \ell$ for all $i$, and $p_\ell = 0$, so $p_i = 0$ for all $i$. This contradicts $p$ being a distribution, proving the claim.

Now without loss of generality, $Z_{1n} > 0$ and $1 \not\preceq n$. So $(p^k_1/p^k_n)_{k \in R}$ is unbounded. We may choose an infinite subset $S \subseteq R$ such that $\lim_{k \in S} (p^k_1/p^k_n) = \infty$. For all $k \in S$ we have
$$\frac{(Zp^k)_n}{p^k_n} \geq \frac{Z_{n1}\, p^k_1}{p^k_n}$$
with $Z_{n1} = Z_{1n} > 0$, so
$$\lim_{k \in S} \left(\frac{(Zp^k)_n}{p^k_n}\right) = \infty,$$
and condition (iii) follows. □



Existence of a maximizing distribution At the heart of Theorem 3.1 is 
the following result, from which we will deduce the theorem itself. 

Proposition 3.7 Every similarity matrix has a maximizing distribution, and every maximizing distribution is invariant.

Proof Let $Z$ be a similarity matrix. It is enough to prove that $Z$ admits an invariant maximizing distribution: for if $p$ and $p'$ are both maximizing then $D_q^Z(p) = D_q^Z(p')$ for all $q$, so $p$ is invariant if and only if $p'$ is.

The result holds for $n = 1$. Suppose inductively that $n \geq 2$.

Case 1: $Z$ is not connected. We may partition $\{1, \ldots, n\}$ into two nonempty subsets, $B$ and $B'$, each of which is a union of $\sim$-equivalence classes (where $\sim$ is as defined before Corollary 3.5). Then $B$ and $B'$ are complementary, and by inductive hypothesis, $Z_B$ and $Z_{B'}$ each admit an invariant maximizing distribution. So by Proposition 2.15, $Z$ admits one too.

Case 2: $Z$ is connected. Write $\alpha = \sup_{\mathbf{p}} D_0^Z(\mathbf{p})$. We may choose a sequence $(\mathbf{p}^k)_{k\in\mathbb{N}}$ in $\Delta_n$ with $\lim_{k\to\infty} D_0^Z(\mathbf{p}^k) = \alpha$. By Lemma 3.6, at least one of the following three conditions holds.

i. There is a subsequence $(\mathbf{p}^k)_{k\in S}$ such that (without loss of generality) $p^k_n = 0$ for all $k\in S$. Write $B = \{1,\ldots,n-1\}$. Define a sequence $(\mathbf{r}^k)_{k\in S}$ in $\Delta_{n-1}$ by

$$\mathbf{r}^k = (p^k_1,\ldots,p^k_{n-1}).$$

Then for all $k\in S$ we have $D_0^{Z_B}(\mathbf{r}^k) = D_0^Z(\mathbf{p}^k)$ (by Lemma 2.8), so $\sup_{k\in S} D_0^{Z_B}(\mathbf{r}^k) = \alpha$. Then by Lemma 2.12 and inductive hypothesis, $Z$ admits an invariant maximizing distribution.

ii. There is a subsequence $(\mathbf{p}^k)_{k\in S}$ in $\Delta_n^\circ$ convergent to some point $\mathbf{p}\in\Delta_n^\circ$. Since $D_0^Z$ is continuous on $\Delta_n^\circ$,

$$D_0^Z(\mathbf{p}) = \lim_{k\in S} D_0^Z(\mathbf{p}^k) = \alpha.$$

So $\mathbf{p}$ is 0-maximizing. Now $\mathbf{p}$ is a critical point of $D_0^Z$ on $\Delta_n^\circ$, and $Z$ is connected, so by Corollary 3.5, $\mathbf{p}$ is a weight distribution. By Lemma 2.6, $\mathbf{p}$ is invariant; then by Lemma 2.11, $\mathbf{p}$ is maximizing.

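iii. There is a subsequence $(\mathbf{p}^k)_{k\in S}$ in $\Delta_n^\circ$ such that (without loss of generality) $\lim_{k\in S}\bigl(p^k_n/(Z\mathbf{p}^k)_n\bigr) = 0$. Write $B = \{1,\ldots,n-1\}$. Define a sequence $(\mathbf{r}^k)_{k\in S}$ in $\Delta_{n-1}$ by $\mathbf{r}^k = \mathbf{p}^k|_B/(1-p^k_n)$ (which is possible because $\mathbf{p}^k\in\Delta_n^\circ$ and $n\ge 2$). Then for all $k\in S$ and $i\in B$,

$$(Z_B\mathbf{r}^k)_i = \sum_{j=1}^{n-1} Z_{ij}\frac{p^k_j}{1-p^k_n} = \frac{(Z\mathbf{p}^k)_i - Z_{in}p^k_n}{1-p^k_n} \le \frac{(Z\mathbf{p}^k)_i}{1-p^k_n}.$$

Hence for all $k\in S$,

$$D_0^{Z_B}(\mathbf{r}^k) = \sum_{i=1}^{n-1}\frac{r^k_i}{(Z_B\mathbf{r}^k)_i} \ge \sum_{i=1}^{n-1}\frac{p^k_i}{(Z\mathbf{p}^k)_i} = D_0^Z(\mathbf{p}^k) - \frac{p^k_n}{(Z\mathbf{p}^k)_n}.$$

But

$$\lim_{k\in S}\left(D_0^Z(\mathbf{p}^k) - \frac{p^k_n}{(Z\mathbf{p}^k)_n}\right) = \alpha - 0 = \alpha,$$

so $\sup_{k\in S} D_0^{Z_B}(\mathbf{r}^k)\ge\alpha$. Then by Lemma 2.12 and inductive hypothesis, $Z$ admits an invariant maximizing distribution.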

So in all cases there is an invariant maximizing distribution, completing the 
induction. □ 
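The key inequality in case iii, $D_0^{Z_B}(\mathbf{r}) \ge D_0^Z(\mathbf{p}) - p_n/(Z\mathbf{p})_n$ with $\mathbf{r} = \mathbf{p}|_B/(1-p_n)$ and $B = \{1,\ldots,n-1\}$, holds for every nowhere-zero distribution $\mathbf{p}$, not only along the chosen subsequence. The Python sketch below is a numerical sanity check of that inequality on randomly generated similarity matrices, not part of the proof; the helper names (`zp`, `d0`, `random_similarity`, `check_case_iii`) are hypothetical.

```python
import random


def zp(Z, p):
    """Return the vector Zp as a list."""
    n = len(p)
    return [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]


def d0(Z, p):
    """Diversity of order 0: sum over the support of p of p_i / (Zp)_i."""
    v = zp(Z, p)
    return sum(p[i] / v[i] for i in range(len(p)) if p[i] > 0)


def random_similarity(n, rng):
    """Symmetric n x n matrix with unit diagonal and off-diagonal entries in [0, 1)."""
    Z = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            Z[i][j] = Z[j][i] = rng.random()
    return Z


def check_case_iii(n=4, trials=1000, seed=0):
    """Test D_0^{Z_B}(r) >= D_0^Z(p) - p_n/(Zp)_n on random nowhere-zero p."""
    rng = random.Random(seed)
    for _ in range(trials):
        Z = random_similarity(n, rng)
        x = [rng.random() + 1e-6 for _ in range(n)]  # strictly positive weights
        total = sum(x)
        p = [xi / total for xi in x]
        r = [pi / (1 - p[-1]) for pi in p[:-1]]      # restrict to B, renormalize
        ZB = [row[:-1] for row in Z[:-1]]            # delete last row and column
        lhs = d0(ZB, r)
        rhs = d0(Z, p) - p[-1] / zp(Z, p)[-1]
        if lhs < rhs - 1e-9:
            return False
    return True


print(check_case_iii())
```

The small tolerance only absorbs floating-point rounding; the inequality itself is exact, since each term $r_i/(Z_B\mathbf{r})_i$ dominates $p_i/(Z\mathbf{p})_i$.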



Proof of the Main Theorem, 3.1 

i. Let $q\in[0,\infty]$. By Proposition 3.7, the supremum $\sup_{\mathbf{p}} D_q^Z(\mathbf{p})$ is unchanged if $\mathbf{p}$ is taken to run over only the invariant distributions. By Proposition 2.9, any invariant distribution is of the form $\widehat{\mathbf{w}}$ for some non-negative weighting $\mathbf{w}$ on some nonempty subset $B\subseteq\{1,\ldots,n\}$. Hence

$$\sup_{\mathbf{p}} D_q^Z(\mathbf{p}) = \max_{B,\,\mathbf{w}} D_q^Z(\widehat{\mathbf{w}})$$

where the maximum is over all nonempty $B$ and non-negative weightings $\mathbf{w}$ on $B$. But for any such $B$ and $\mathbf{w}$ we have

$$D_q^Z(\widehat{\mathbf{w}}) = D_q^{Z_B}(\mathbf{w}/|Z_B|) = |Z_B|$$

by Lemmas 2.8 and 2.6 respectively. Hence

$$\sup_{\mathbf{p}} D_q^Z(\mathbf{p}) = \max_B |Z_B|$$

where the maximum is now over all nonempty $B\subseteq\{1,\ldots,n\}$ such that there exists a non-negative weighting on $Z_B$. And since $|\emptyset| = 0$, it makes no difference if we allow $B$ to be empty.

ii. Any maximizing distribution is invariant, by Proposition 3.7. The result 
now follows from Proposition 2.9. □ 
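Part i turns the computation of the maximum diversity into a finite search: maximize $|Z_B|$ over the subsets $B$ that admit a non-negative weighting. The Python sketch below does this by brute force in exact rational arithmetic. It rests on a simplifying assumption of its own: it only examines subsets $B$ for which $Z_B$ is invertible (so the weighting is unique and Gauss-Jordan elimination finds it), and so it may overlook a subset whose submatrix is singular but still carries a non-negative weighting. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_exact(A, b):
    """Solve A x = b by Gauss-Jordan elimination over the rationals; None if singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def max_diversity(Z):
    """Return (max |Z_B|, B, weighting on Z_B) over subsets B with invertible Z_B
    and non-negative weighting.  Visits all 2^n - 1 nonempty subsets."""
    n = len(Z)
    best = (Fraction(0), (), [])
    for size in range(1, n + 1):
        for B in combinations(range(n), size):
            ZB = [[Z[i][j] for j in B] for i in B]
            w = solve_exact(ZB, [1] * size)
            if w is not None and all(x >= 0 for x in w) and sum(w) > best[0]:
                best = (sum(w), B, w)
    return best


def maximizing_distribution(Z):
    """Extend w / |Z_B| by zero to a distribution on {0, ..., n-1}."""
    mag, B, w = max_diversity(Z)
    p = [Fraction(0)] * len(Z)
    for i, wi in zip(B, w):
        p[i] = wi / mag
    return mag, p


# Reflexive path graph 0 - 1 - 2: vertices 0 and 2 are not adjacent.
path = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(maximizing_distribution(path))
```

For the path graph the search returns magnitude 2 with distribution $(1/2, 0, 1/2)$, attained on $B = \{0, 2\}$; the full matrix has weighting $(0, 1, 0)$ and magnitude only 1. The double loop over subsets is exactly the exponential cost discussed later in this section.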
4 Corollaries and examples



Here we state some corollaries to the results of the previous section. The first 
is a companion to Lemma 2.11. 

Corollary 4.1 Let $Z$ be a similarity matrix and $q\in(0,\infty]$. Then a distribution is $q$-maximizing if and only if it is maximizing.

In other words, if a distribution is $q$-maximizing for some $q>0$ then it is $q$-maximizing for all $q\in[0,\infty]$. The proof is below.

However, a 0-maximizing distribution is not necessarily maximizing. Take $Z = I$, for example. Then $D_0^I(\mathbf{p}) = \sum_{i:p_i>0} 1$, the cardinality of $\mathrm{supp}(\mathbf{p})$, so any nowhere-zero distribution is 0-maximizing. On the other hand, only the uniform distribution $(1/n,\ldots,1/n)$ is maximizing. So the restriction $q\neq 0$ cannot be dropped from Corollary 4.1, nor can the word 'invariant' be dropped from Lemma 2.11.
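To see the example concretely, the Python sketch below evaluates the diversity of order $q$, $D_q^Z(\mathbf{p}) = \bigl(\sum_{i:p_i>0} p_i (Z\mathbf{p})_i^{q-1}\bigr)^{1/(1-q)}$ with the usual special cases $q = 1$ (exponential of minus the weighted log) and $q = \infty$ (reciprocal of the largest $(Z\mathbf{p})_i$ on the support), for $Z = I$ and $n = 3$. The function name `diversity` is hypothetical, and floating-point arithmetic is assumed accurate enough for illustration.

```python
import math


def diversity(Z, p, q):
    """Diversity of order q of the distribution p with respect to Z."""
    n = len(p)
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    supp = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in supp))
    if q == math.inf:
        return 1 / max(zp[i] for i in supp)
    return sum(p[i] * zp[i] ** (q - 1) for i in supp) ** (1 / (1 - q))


I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
skewed = [0.5, 0.3, 0.2]
uniform = [1 / 3] * 3

# Order 0 counts the support, so both distributions are 0-maximizing ...
print(diversity(I, skewed, 0), diversity(I, uniform, 0))
# ... but at order 2 only the uniform distribution attains the maximum, 3.
print(diversity(I, skewed, 2), diversity(I, uniform, 2))
```

Both distributions score 3 at order 0, while at order 2 the skewed one drops to $1/0.38 \approx 2.63$.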

Proof Let $\mathbf{p}$ be a $q$-maximizing distribution. Then

$$D_q^Z(\mathbf{p}) = D_{\max}(Z) \ge D_0^Z(\mathbf{p}) \ge D_q^Z(\mathbf{p}),$$

where the second inequality is by Lemma 1.2. So we have equality throughout, and in particular $D_0^Z(\mathbf{p}) = D_q^Z(\mathbf{p})$. But $q>0$, so by Lemma 1.2, $\mathbf{p}$ is invariant. Hence for all $q'\in[0,\infty]$,

$$D_{q'}^Z(\mathbf{p}) = D_q^Z(\mathbf{p}) = D_{\max}(Z),$$

and therefore $\mathbf{p}$ is maximizing. □
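The proof above relies on $D_q^Z(\mathbf{p})$ not increasing in $q$, and on equality between $D_0^Z(\mathbf{p})$ and some $D_q^Z(\mathbf{p})$ with $q>0$ forcing invariance (Lemma 1.2). As a small numerical illustration in Python, assume the $2\times 2$ similarity matrix with off-diagonal entry $1/2$, chosen purely for this example: its weighting is $(2/3, 2/3)$, so $|Z| = 4/3$ and $\mathbf{w}/|Z| = (1/2, 1/2)$. The sketch shows that $q\mapsto D_q^Z$ is flat at $4/3$ for that distribution but decreases for a skewed one. The function name is hypothetical.

```python
import math


def diversity(Z, p, q):
    """Diversity of order q of the distribution p with respect to Z."""
    n = len(p)
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    supp = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in supp))
    if q == math.inf:
        return 1 / max(zp[i] for i in supp)
    return sum(p[i] * zp[i] ** (q - 1) for i in supp) ** (1 / (1 - q))


Z = [[1.0, 0.5], [0.5, 1.0]]
orders = [0, 0.5, 1, 2, 5, math.inf]

balanced = [0.5, 0.5]  # equals w/|Z|, since Z w = (1, 1) for w = (2/3, 2/3)
skewed = [0.9, 0.1]

flat = [diversity(Z, balanced, q) for q in orders]
profile = [diversity(Z, skewed, q) for q in orders]
print(flat)     # every entry is 4/3 up to rounding
print(profile)  # decreases as q increases
```

For the balanced distribution $(Z\mathbf{p})_i = 3/4$ for both $i$, which is what makes every order give the same value.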

The importance of Corollary 4.1 is that if one has solved the problem of 
maximizing entropy or diversity of any particular order q > 0, then one has 
solved the problem of maximizing entropy and diversity of all orders. In the 
following example, we observe that for a certain class of similarity matrices, 
the problem of maximizing the entropy of order 2 has already been solved in 
the literature; we can immediately deduce a more general maximum entropy 
theorem. 

Example 4.2 Let $G$ be a finite reflexive graph. Thus, $G$ consists of a finite set $\{1,\ldots,n\}$ of vertices equipped with a reflexive symmetric binary relation $E$. Such graphs correspond to similarity matrices $Z$ whose entries are all 0 or 1, taking $Z_{ij} = 1$ if $(i,j)\in E$ and $Z_{ij} = 0$ otherwise.

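For each $q\in[0,\infty]$ and distribution $\mathbf{p}$, define $D_q^G(\mathbf{p}) = D_q^Z(\mathbf{p})$, the diversity of order $q$ of $\mathbf{p}$ with respect to $G$. Thus,

$$D_q^G(\mathbf{p}) = \begin{cases}
\left(\sum_{i:p_i>0} p_i\Bigl(\sum_{j:(i,j)\in E} p_j\Bigr)^{q-1}\right)^{1/(1-q)} & \text{if } q\neq 1,\infty\\
\prod_{i:p_i>0}\Bigl(\sum_{j:(i,j)\in E} p_j\Bigr)^{-p_i} & \text{if } q = 1\\
1\Big/\max_{i:p_i>0}\sum_{j:(i,j)\in E} p_j & \text{if } q = \infty.
\end{cases}$$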



A set $K\subseteq\{1,\ldots,n\}$ of vertices of $G$ is discrete if $(i,j)\notin E$ whenever $i,j\in K$ with $i\neq j$. Write $d(G)$ for the largest integer $d$ such that there exists a discrete set in $G$ of cardinality $d$. Also, given any nonempty set $B\subseteq\{1,\ldots,n\}$, write $\mathbf{p}^B$ for the distribution

$$p^B_i = \begin{cases} 1/|B| & \text{if } i\in B\\ 0 & \text{otherwise.}\end{cases}$$

Claim: For all $q\in[0,\infty]$,

$$\sup_{\mathbf{p}} D_q^G(\mathbf{p}) = d(G),$$

and the supremum is attained at $\mathbf{p}^K$ for any discrete set $K$ of cardinality $d(G)$.

Proof: We use the following result of Berarducci, Majer and Novaga [BMN]. Let $G'$ be a finite irreflexive graph with $n$ vertices, that is, an irreflexive symmetric binary relation $E'$ on $\{1,\ldots,n\}$. A set $K$ of vertices of $G'$ is a clique (or complete subgraph) if $(i,j)\in E'$ whenever $i,j\in K$ with $i\neq j$. Write $c(G')$ for the largest integer $c$ such that there exists a clique in $G'$ of cardinality $c$. Their Proposition 4.1 states that

$$\sup_{\mathbf{p}}\sum_{(i,j)\in E'} p_i p_j = 1 - \frac{1}{c(G')}$$

(which they call the 'capacity' of $G'$). Their proof shows that the supremum is attained at $\mathbf{p}^K$ for any clique $K$ of cardinality $c(G')$.

We are given a graph $G$. Let $G'$ be its dual graph, with the same vertex-set and with edge-relation $E'$ defined by $(i,j)\in E'$ if and only if $(i,j)\notin E$. Then $G'$ is irreflexive, a clique in $G'$ is the same as a discrete set in $G$, and $c(G') = d(G)$. For any distribution $\mathbf{p}$,

$$\sum_{(i,j)\in E'} p_i p_j = 1 - \sum_{(i,j)\in E} p_i p_j = H_2^G(\mathbf{p}).$$
Let $K$ be a clique in $G'$ of maximal cardinality, that is, a discrete set in $G$ of maximal cardinality. Then by [BMN], $\mathbf{p}^K$ is 2-maximizing and

$$H_2^G(\mathbf{p}^K) = 1 - \frac{1}{d(G)}.$$

But it is a completely general fact that

$$H_2^Z(\mathbf{p}) = 1 - \frac{1}{D_2^Z(\mathbf{p})}$$

for all $\mathbf{p}$, directly from the definitions in Section 1. Hence $d(G) = D_2^G(\mathbf{p}^K) = \sup_{\mathbf{p}} D_2^G(\mathbf{p})$, and the claim holds for $q = 2$. Corollary 4.1 now tells us that the claim holds for all $q\in[0,\infty]$.
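For a concrete check of the claim, the Python sketch below builds the 0/1 similarity matrix of the reflexive 5-cycle (an example chosen here for illustration, not taken from [BMN]), finds a largest discrete set by exhaustive search, and evaluates $D_q^G(\mathbf{p}^K)$ at several orders. The claim predicts the value $d(G) = 2$ at every order. Function names are hypothetical, and the search is exponential in $n$.

```python
import math
from itertools import combinations


def diversity(Z, p, q):
    """Diversity of order q of the distribution p with respect to Z."""
    n = len(p)
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    supp = [i for i in range(n) if p[i] > 0]
    if q == 1:
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in supp))
    if q == math.inf:
        return 1 / max(zp[i] for i in supp)
    return sum(p[i] * zp[i] ** (q - 1) for i in supp) ** (1 / (1 - q))


def graph_matrix(n, edges):
    """0/1 similarity matrix of the reflexive graph on 0..n-1 with the given edges."""
    Z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        Z[i][j] = Z[j][i] = 1.0
    return Z


def largest_discrete_set(Z):
    """Exhaustive search for a largest set of pairwise non-adjacent vertices."""
    n = len(Z)
    for size in range(n, 0, -1):
        for K in combinations(range(n), size):
            if all(Z[i][j] == 0 for i, j in combinations(K, 2)):
                return K
    return ()


cycle = graph_matrix(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
K = largest_discrete_set(cycle)
pK = [1 / len(K) if i in K else 0.0 for i in range(5)]
print(K, [diversity(cycle, pK, q) for q in (0, 1, 2, math.inf)])
```

The uniform distribution on all five vertices does worse: each $(Z\mathbf{p})_i = 3/5$, so its diversity of every order is $5/3 < 2$.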

This class of examples tells us that a similarity matrix may have several different maximizing distributions, and that a maximizing distribution $\mathbf{p}$ may have $p_i = 0$ for some values of $i$. These phenomena have been observed in the ecological literature in the case $q = 2$ (Rao's quadratic entropy): see Pavoine and Bonsall [PB] and references therein.

Computing the maximum diversity is potentially slow, because in principle one has to go through all $2^n$ subsets of $\{1,\ldots,n\}$. But if the similarity matrix satisfies some further conditions, a maximizing distribution can be found very quickly:

Corollary 4.3 Let $Z$ be a positive definite similarity matrix whose unique weighting $\mathbf{w}$ is non-negative. Then $D_{\max}(Z) = |Z|$. Moreover, $\mathbf{w}/|Z|$ is a maximizing distribution, and if $\mathbf{w}$ is positive then it is the unique such.

Proof Follows immediately from Proposition 2.18 and Theorem 3.1. □ 
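Under the hypotheses of Corollary 4.3 no subset search is needed: one linear solve gives $\mathbf{w}$, and $\mathbf{p} = \mathbf{w}/|Z|$ satisfies $(Z\mathbf{p})_i = 1/|Z|$ for every $i$, which makes $D_q^Z(\mathbf{p}) = |Z|$ at every order. A Python sketch in exact arithmetic, assuming the matrix below (chosen for illustration; it is positive definite, with leading principal minors $1$, $3/4$ and $9/16$). The helper name `solve_exact` is hypothetical.

```python
from fractions import Fraction as F


def solve_exact(A, b):
    """Solve A x = b by Gauss-Jordan elimination over the rationals; None if singular."""
    n = len(A)
    M = [[F(x) for x in row] + [F(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


Z = [[F(1), F(1, 2), F(1, 4)],
     [F(1, 2), F(1), F(1, 2)],
     [F(1, 4), F(1, 2), F(1)]]

w = solve_exact(Z, [1, 1, 1])       # the unique weighting: Z w = (1, 1, 1)
magnitude = sum(w)                  # |Z|
p = [wi / magnitude for wi in w]    # the maximizing distribution w/|Z|
Zp = [sum(Z[i][j] * p[j] for j in range(3)) for i in range(3)]
print(w, magnitude, p, Zp)
```

Here $\mathbf{w} = (2/3, 1/3, 2/3)$, $|Z| = 5/3$ and $\mathbf{p} = (2/5, 1/5, 2/5)$, with $Z\mathbf{p}$ constant at $3/5$.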

Example 4.4 A similarity matrix $Z$ is ultrametric if $\min\{Z_{ij}, Z_{jk}\}\le Z_{ik}$ for all $i,j,k$ and $Z_{ij}<1$ for all $i\neq j$. As shown below, every ultrametric matrix is positive definite and its weighting is positive. Hence its maximum diversity is its magnitude, it has a unique maximizing distribution, and that distribution is nowhere zero.

Ultrametric matrices are closely related to ultrametric spaces, that is, metric spaces satisfying a stronger version of the triangle inequality:

$$\max\{d(a,b),\,d(b,c)\}\ge d(a,c)$$

for all points $a,b,c$. Any finite metric space $A = \{a_1,\ldots,a_n\}$ gives rise to a similarity matrix $Z$ by putting $Z_{ij} = e^{-d(a_i,a_j)}$, and if the space $A$ is ultrametric then so is the matrix $Z$.

Ultrametric matrices also arise in the quantification of biodiversity. Take a collection of $n$ species, and suppose, for example, that we choose a taxonomic measure of species similarity:

$$Z_{ij} = \begin{cases}
1 & \text{if } i = j\\
0.8 & \text{if } i\neq j \text{ but the } i\text{th and } j\text{th species are of the same genus}\\
0.6 & \text{if the } i\text{th and } j\text{th species are of different genera but the same family}\\
0 & \text{otherwise.}
\end{cases}$$
This is an ultrametric matrix, so it is guaranteed to have a unique maximizing distribution. That distribution is nowhere zero: maximizing diversity does not eradicate any species. The same conclusion for general ultrametric matrices was reached, in the case $q = 2$, by Pavoine, Ollier and Pontier [POP].
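As an illustration (the four species and their classification are hypothetical), take two species in the same genus, a third in a different genus of the same family, and a fourth in a different family. The Python sketch below checks that the resulting matrix is ultrametric, computes its weighting exactly, and confirms that the maximizing distribution $\mathbf{w}/|Z|$ has no zero entries. Helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def is_ultrametric(Z):
    """min{Z_ij, Z_jk} <= Z_ik for all i, j, k, and Z_ij < 1 for i != j."""
    n = len(Z)
    if any(Z[i][j] >= 1 for i in range(n) for j in range(n) if i != j):
        return False
    return all(min(Z[i][j], Z[j][k]) <= Z[i][k]
               for i, j, k in product(range(n), repeat=3))


def solve_exact(A, b):
    """Solve A x = b by Gauss-Jordan elimination over the rationals; None if singular."""
    n = len(A)
    M = [[F(x) for x in row] + [F(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


same_genus, same_family = F(4, 5), F(3, 5)
# Species 0 and 1 share a genus; species 2 is in the same family; species 3 is not.
Z = [[F(1), same_genus, same_family, F(0)],
     [same_genus, F(1), same_family, F(0)],
     [same_family, same_family, F(1), F(0)],
     [F(0), F(0), F(0), F(1)]]

w = solve_exact(Z, [1] * 4)
p = [wi / sum(w) for wi in w]
print(is_ultrametric(Z), w, sum(w), p)
```

The weighting is $(10/27, 10/27, 5/9, 1)$ and the maximum diversity is $|Z| = 62/27$. In the resulting distribution $(5/31, 5/31, 15/62, 27/62)$ the taxonomically isolated fourth species receives the largest share, and no species is eradicated.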

We now prove that any ultrametric matrix Z is positive definite with pos- 
itive weighting. That Z is positive definite was also proved by Varga and 
Nabben [VN], and that the weighting is positive was also proved in [POP]. 
The following proof, which is probably not new either, seems more direct. 

If n = 1 then certainly Z is positive definite and its weighting is positive. 

Suppose inductively that $n\ge 2$. Write $z = \min_{i,j} Z_{ij} < 1$. By the ultrametric property, there is an equivalence relation $\sim$ on $\{1,\ldots,n\}$ defined by $i\sim j$ if and only if $Z_{ij}>z$. We may partition $\{1,\ldots,n\}$ into two nonempty subsets, $B$ and $B'$, each of which is a union of $\sim$-equivalence classes; and without loss of generality, $B = \{1,\ldots,m\}$ and $B' = \{m+1,\ldots,n\}$, where $1\le m<n$. For all $i\le m$ and $j\ge m+1$ we have $Z_{ij}\le z$, that is, $Z_{ij} = z$. Hence



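$$Z = \begin{pmatrix} Y & zU_m^{n-m}\\ zU_{n-m}^{m} & Y'\end{pmatrix}$$

where $Y$ is some $m\times m$ matrix, $Y'$ is some $(n-m)\times(n-m)$ matrix, and $U_k^\ell$ denotes the $k\times\ell$ matrix all of whose entries are 1. Now $Y$ and $Y'$ are ultrametric with entries in $[z,1]$, so the matrices

$$X = \frac{1}{1-z}\bigl(Y - zU_m^m\bigr),\qquad X' = \frac{1}{1-z}\bigl(Y' - zU_{n-m}^{n-m}\bigr)$$

are also ultrametric. By inductive hypothesis, $X$ and $X'$ are positive definite and their respective weightings are positive. We have

$$Z = zU_n^n + (1-z)\begin{pmatrix} X & 0\\ 0 & X'\end{pmatrix}. \qquad (6)$$

The matrix $U_n^n$ is positive-semidefinite, since $\mathbf{x}^{\mathsf{T}}U_n^n\mathbf{x} = (x_1+\cdots+x_n)^2$ for all $\mathbf{x}\in\mathbb{R}^n$. Also $\begin{pmatrix} X & 0\\ 0 & X'\end{pmatrix}$ is positive definite, since $X$ and $X'$ are. Finally, $z\ge 0$ and $1-z>0$. It follows that $Z$ is positive definite.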
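Write $\mathbf{v}$ and $\mathbf{v}'$ for the weightings on $X$ and $X'$. Put

$$\mathbf{w} = \frac{1}{z\left(\sum_{i=1}^{m} v_i + \sum_{j=1}^{n-m} v'_j\right) + (1-z)}\begin{pmatrix} v_1\\ \vdots\\ v_m\\ v'_1\\ \vdots\\ v'_{n-m}\end{pmatrix}.$$

The weightings $\mathbf{v}$ and $\mathbf{v}'$ are positive and $0\le z<1$, so $\mathbf{w}$ is positive. And it is routine to verify, using (6), that $\mathbf{w}$ is the weighting on $Z$.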
Example 4.5 Take a metric space with three points, $a_1, a_2, a_3$, and put $Z_{ij} = e^{-d(a_i,a_j)}$. This defines a $3\times 3$ similarity matrix $Z$ with $Z_{ij}<1$ for all $i\neq j$ and $Z_{ij}Z_{jk}\le Z_{ik}$ for all $i,j,k$. We will show that $Z$ is positive definite and that its unique weighting is positive. It follows that there is a unique maximizing distribution and that the maximum diversity is $|Z|$. We give explicit expressions for both.

First, Sylvester's Criterion states that a symmetric real $n\times n$ matrix is positive definite if and only if for all $m\in\{1,\ldots,n\}$, the upper-left $m\times m$ submatrix has positive determinant. In this case:

• the upper-left $1\times 1$ matrix is $(1)$, which has determinant $1$;

• the upper-left $2\times 2$ matrix is $\begin{pmatrix} 1 & Z_{12}\\ Z_{12} & 1\end{pmatrix}$, which has determinant $1 - Z_{12}^2 > 0$;

• the upper-left $3\times 3$ matrix is $Z$ itself, and

$$\begin{aligned}
\det Z &= 1 - (Z_{12}^2 + Z_{23}^2 + Z_{31}^2) + 2Z_{12}Z_{23}Z_{31}\\
&= (1-Z_{12})(1-Z_{23})(1-Z_{31}) + (1-Z_{12})(Z_{12} - Z_{13}Z_{32})\\
&\quad + (1-Z_{23})(Z_{23} - Z_{21}Z_{13}) + (1-Z_{31})(Z_{31} - Z_{32}Z_{21})\\
&> 0,
\end{aligned}$$

since each factor $1 - Z_{ij}$ with $i\neq j$ is positive and each bracket $Z_{ij} - Z_{ik}Z_{kj}$ is non-negative.

Hence $Z$ is positive definite. Next, it is easily checked that the unique weighting $\mathbf{w}$ is given by $\mathbf{w} = \mathbf{v}/\det Z$, where, for instance,

$$\begin{aligned}
v_1 &= 1 - (Z_{12} + Z_{13}) + (Z_{13}Z_{32} + Z_{12}Z_{23}) - Z_{23}^2\\
&= (1-Z_{12})(1-Z_{23})(1-Z_{31}) + (1-Z_{23})(Z_{23} - Z_{21}Z_{13})\\
&> 0.
\end{aligned}$$

Since $\det Z > 0$, the weighting $\mathbf{w}$ is positive.

The maximum diversity is $|Z| = w_1 + w_2 + w_3$, which is

$$1 + \frac{2(1-Z_{12})(1-Z_{23})(1-Z_{31})}{1 - (Z_{12}^2 + Z_{23}^2 + Z_{31}^2) + 2Z_{12}Z_{23}Z_{31}}.$$

(This expression was pointed out to me by Simon Willerton.) The unique maximizing distribution $\mathbf{p}$ is given by $\mathbf{p} = \widehat{\mathbf{w}} = \mathbf{w}/|Z| = \mathbf{v}/(|Z|\det Z)$, so

$$p_1 = \frac{1 - (Z_{12}+Z_{13}) + (Z_{13}Z_{32}+Z_{12}Z_{23}) - Z_{23}^2}{1 - (Z_{12}^2+Z_{23}^2+Z_{31}^2) + 2Z_{12}Z_{23}Z_{31} + 2(1-Z_{12})(1-Z_{23})(1-Z_{31})}$$

and similarly for $p_2$ and $p_3$.
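The closed forms $|Z| = 1 + 2(1-Z_{12})(1-Z_{23})(1-Z_{31})/\det Z$ and the expression for $p_1$ can be checked against a direct linear solve. The Python sketch below does this in exact arithmetic. It assumes, for illustration, the values $Z_{12} = 1/2$, $Z_{23} = 1/3$, $Z_{13} = 1/4$, which come from the metric with $d(a_1,a_2) = \log 2$, $d(a_2,a_3) = \log 3$, $d(a_1,a_3) = \log 4$ (this satisfies the triangle inequality), so that no exponentials need to be evaluated. The helper name is hypothetical.

```python
from fractions import Fraction as F


def solve_exact(A, b):
    """Solve A x = b by Gauss-Jordan elimination over the rationals; None if singular."""
    n = len(A)
    M = [[F(x) for x in row] + [F(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


z12, z23, z13 = F(1, 2), F(1, 3), F(1, 4)
Z = [[1, z12, z13], [z12, 1, z23], [z13, z23, 1]]

# Closed forms from the text (Z_32 = Z_23 and Z_31 = Z_13 by symmetry).
det = 1 - (z12**2 + z23**2 + z13**2) + 2 * z12 * z23 * z13
prod = (1 - z12) * (1 - z23) * (1 - z13)
magnitude_formula = 1 + 2 * prod / det
p1_formula = (1 - (z12 + z13) + (z13 * z23 + z12 * z23) - z23**2) / (det + 2 * prod)

# Direct computation: solve Z w = (1, 1, 1) and normalize.
w = solve_exact(Z, [1, 1, 1])
magnitude = sum(w)
print(magnitude, magnitude_formula, w[0] / magnitude, p1_formula)
```

Both routes give $|Z| = 167/95$ and $p_1 = 56/167$, and all three weights are positive, as the example predicts.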

Example 4.2 (graphs) shows that maximizing distributions sometimes contain some zero entries. In ecological terms this means that diversity is sometimes maximized by completely eradicating certain species, which may be contrary to acceptable practice. For this and other reasons, we might seek conditions under which some or all of the maximizing distributions $\mathbf{p}$ satisfy $p_i>0$ for all $i$.
Corollary 4.6 Let $Z$ be a similarity matrix such that $Z_{ij} < 1/(n-1)$ for all $i\neq j$. Then $D_{\max}(Z) = |Z|$. Moreover, $Z$ has a unique weighting $\mathbf{w}$, the unique maximizing distribution is $\mathbf{p} = \mathbf{w}/|Z|$, and $p_i>0$ for all $i$.

Proof By Lemma 2.19, Z is positive definite and its unique weighting is posi- 
tive. Then apply Corollary 4.3. □ 

The extra hypothesis on Z is strong, possibly too strong for the corollary 
to be of any use in ecology: when n is large, it forces Z to be very close to 
the identity matrix. On the other hand, the ecological interpretation of Corol- 
lary 4.6 is clear: if we treat every species as highly dissimilar to every other, the 
distribution that maximizes diversity conserves all of them. 
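A randomized sanity check of Corollary 4.6 in Python, as a sketch only: it samples symmetric matrices whose off-diagonal entries lie in $[0, 0.9/(n-1)]$ (the factor $0.9$ is an arbitrary safety margin for floating-point arithmetic, well inside the hypothesis), tests positive definiteness by attempting a Cholesky factorization (a swapped-in numerical test, not the argument of Lemma 2.19), and checks that the weighting, and hence $\mathbf{p} = \mathbf{w}/|Z|$, is positive. Helper names are hypothetical.

```python
import math
import random


def is_positive_definite(Z):
    """Attempt a Cholesky factorization Z = L L^T; it succeeds iff Z is positive definite."""
    n = len(Z)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Z[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def solve(A, b):
    """Gaussian elimination with partial pivoting, then back substitution."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def check_corollary_4_6(trials=200, seed=0):
    """Sample matrices satisfying the hypothesis; check PD and positive weighting."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 8)
        Z = [[1.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                Z[i][j] = Z[j][i] = rng.uniform(0, 0.9 / (n - 1))
        w = solve(Z, [1.0] * n)
        if not is_positive_definite(Z) or min(w) <= 0:
            return False
    return True


print(check_corollary_4_6())
```

The bound $Z_{ij} < 1/(n-1)$ makes every row's off-diagonal sum less than 1, so $Z$ is strictly diagonally dominant; this is why the sampled matrices pass the Cholesky test.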